SEQUENTIAL CHANGE DETECTION REVISITED

By George V. Moustakides 

University of Patras 

In sequential change detection, existing performance measures differ
significantly in the way they treat the time of change. By modeling this
quantity as a random time, we introduce a general framework capable of
capturing and better understanding most well-known criteria and also
propose new ones. For a specific new criterion that constitutes an
extension to Lorden's performance measure, we offer the optimum structure
for detecting a change in the constant drift of a Brownian motion and a
formula for the corresponding optimum performance.

1. Introduction. Suppose we are observing sequentially a process
$\{\xi_t\}_{t>0}$, which up to and including time $\tau \ge 0$ follows the
probability measure $P_\infty$ and after $\tau$ switches to an alternative
regime $P_0$. Parameter $\tau$ is the change-time and denotes the last time
instant the process is under the nominal regime $P_\infty$. The goal is to
detect the change of measures as soon as possible, using a sequential
scheme.

Any sequential test can be modeled as a stopping time (s.t.) $T$ adapted to
the filtration $\{\mathcal{F}_t\}_{t\ge 0}$, where
$\mathcal{F}_t = \sigma\{\xi_s, 0 < s \le t\}$ for $t > 0$, and
$\mathcal{F}_0$ is the trivial $\sigma$-algebra. We note that the process
$\{\xi_t\}$ becomes available for $t > 0$ while the change-time $\tau$ can
take the value 0 as well. This is because with $\tau = 0$ we would like to
capture the case where all observations are under the alternative regime,
whereas $\tau = \infty$ refers to the case where all observations are under
the nominal regime. More generally, $P_\tau$ denotes the probability measure
induced by the change occurring at $\tau$ and $E_\tau[\cdot]$ the
corresponding expectation. In particular, if $X$ is an
$\mathcal{F}_\infty$-measurable random variable and $\tau = t$ a
deterministic time of change, then we can write

(1.1)  $E_t[X] = E_\infty[E_0[X \mid \mathcal{F}_t]]$.

In developing optimum change detection algorithms, the first step consists
in defining a suitable performance measure. Existing criteria basically
quantify the detection delay $(T - \tau)^+$, where $x^+ = \max\{x, 0\}$, by
considering alternative versions of its average. These definitions play an
important role in the underlying mathematical model for the change-time
$\tau$.

Currently we distinguish two major models for change-time. The first,
introduced by Shiryayev (1978), assumes that $\tau$ is random with a known
(exponential) prior. We can accompany this change-time model with the
performance measure $J_S(T) = E_\tau[T - \tau \mid T > \tau]$, that is, the
average detection delay conditioned on the event that we stop after the
change. Alternatively, we can consider $\tau$ to be deterministic and
unknown and follow a worst-case scenario. There exist two possibilities.
The first, proposed by Lorden (1971), considers the worst average delay
$J_L(T) = \sup_{0 \le t} \operatorname{essup} E_t[(T - t)^+ \mid \mathcal{F}_t]$
conditioned on the least-favorable observations before the change. The
second, due to Pollak (1985), uses the worst average delay
$J_P(T) = \sup_{0 \le t} E_t[T - t \mid T > t]$ conditioned on the event
that we stop after $\tau$.

Shiryayev's Bayesian approach presents definite analytical advantages and
has been the favorite underlying model in several existing optimality
results, such as Poor (1998), Beibel (1996), Peskir and Shiryayev (2002),
Karatzas (2003), Bayraktar and Dayanik (2006) and Bayraktar, Dayanik and
Karatzas (2006). The two deterministic approaches on the other hand,
although more analytically involved, are clearly more tractable from a
practical point of view since they do not make any limiting assumptions.

As will become evident in Section 3, the three performance criteria can be
ordered as follows: $J_S(T) \le J_P(T) \le J_L(T)$. Because of this
property, there exist strong arguments against Lorden's measure as being
overly pessimistic. Such claims, however, tend to be inconsistent with the
fact that $J_L(T)$, whenever it can be optimized, gives rise to the CUSUM
s.t., one of the most widely used change detection schemes in practice.
Despite their similarity, Pollak's $J_P(T)$ and Lorden's $J_L(T)$ measures,
as we are going to see, differ in a very essential way. In fact $J_P(T)$,
although not obvious at this point, will be shown to be closer to
Shiryayev's $J_S(T)$ measure than to Lorden's $J_L(T)$.

In the next section we present a general approach for modeling the
change-time $\tau$. The three measures presented previously will turn out to
be special cases of our general setting corresponding to different levels
and forms of prior knowledge. The understanding of their differences will
give rise to a discussion concerning the suitability of each measure for
the problem of interest and will explain, we believe in a convincing way,
why Lorden's criterion, although seemingly more pessimistic than the other
two, is more appropriate for the majority of change detection problems.
Finally, we are going to introduce an additional criterion that constitutes
an extension to Lorden's $J_L(T)$ performance measure. For this case, we
will also provide the optimum test for detecting a change in the constant
drift of a Brownian motion and a formula for the corresponding optimum
performance.

2. A randomized change-time. Suppose that nature, at every time instant
$t$, consults the available information $\mathcal{F}_t$ and with some
randomization probability decides whether it should continue using the
nominal probability measure or switch to the alternative one. Consequently,
let $\pi_t$ denote the randomization probability that there is a change at
$t$ conditioned on the available information up to time $t$, that is,
$\pi_t = P[\tau = t \mid \mathcal{F}_t]$. Clearly, $\pi_t$ is nonnegative
and the process $\{\pi_t\}$ is $\{\mathcal{F}_t\}$-adapted.

We recall that time $\tau$ is usually considered in the literature as the
first time instant under the alternative regime. With the current setting
this is no longer possible. Indeed, since there is a decision involved
whether to change the statistics or not, this decision must be made before
any data under the alternative regime are produced. Therefore, $\tau$
denotes the time we stop using the nominal regime.

Consider now a process $\{X_t\}_{t \ge 0}$, where $X_t$ is nonnegative and
$\mathcal{F}_\infty$-measurable (the process is not necessarily
$\{\mathcal{F}_t\}$-adapted). We would like to compute the expectation of
the random variable which is the $\tau$-randomly-stopped version of
$\{X_t\}$, but we are interested only in finite values of $\tau$. In other
words we would like to find $E_\tau[X_\tau \mid \tau < \infty]$. Using (1.1)
and the fact that $\pi_t$ is $\mathcal{F}_t$-measurable, we can write

$E_\tau[X_\tau 1_{\{\tau < \infty\}}] = \sum_{t=0}^{\infty} E_t[X_t \pi_t] = \sum_{t=0}^{\infty} E_\infty[E_0[X_t \mid \mathcal{F}_t]\, \pi_t]$.

Substituting $X_t = 1$ in the previous relation, we obtain
$P_\tau[\tau < \infty] = \sum_{t=0}^{\infty} E_\infty[\pi_t]$, which is an
expression for the probability of stopping at finite time. Combining the
two outcomes leads to

$E_\tau[X_\tau \mid \tau < \infty] = \frac{\sum_{t=0}^{\infty} E_\infty[E_0[X_t \mid \mathcal{F}_t]\, \pi_t]}{\sum_{t=0}^{\infty} E_\infty[\pi_t]}$.

From now on, and without loss of generality, we make the simplifying
assumption that $P_\tau[\tau < \infty] = 1$ (otherwise divide each $\pi_t$
by $P_\tau[\tau < \infty]$). Under this assumption, we have

(2.1)  $E_\tau[X_\tau] = \sum_{t=0}^{\infty} E_\infty[E_0[X_t \mid \mathcal{F}_t]\, \pi_t]$.

Let us summarize our change-time model. We are given a time increasing
information (filtration) $\{\mathcal{F}_t\}_{t \ge 0}$ with $\mathcal{F}_0$
being the trivial $\sigma$-algebra, and a sequence of
$\{\mathcal{F}_t\}$-adapted probabilities $\{\pi_t\}$. Quantity $\pi_t$
denotes the history-dependent randomization probability that $t$ is the
last time instant we obtain information under the nominal probability
$P_\infty$ and at the next time instant the new information will follow the
alternative measure $P_0$. For a process $\{X_t\}$ with $X_t$ being
nonnegative and $\mathcal{F}_\infty$-measurable, we define the expectation
of the $\tau$-randomly-stopped process $X_\tau$ with respect to the measure
induced by the change, with the help of (2.1).

2.1. Decomposition of the change-time statistics. The process $\{\pi_t\}$
can be decomposed as $\pi_t = \varpi_t p_t$ where $\{\varpi_t\}$ is a
deterministic sequence of probabilities defined as

(2.2)  $\varpi_t = E_\infty[\pi_t]$, therefore $\sum_{t=0}^{\infty} \varpi_t = P_\tau[\tau < \infty] = 1$;

and $\{p_t\}$ a nonnegative $\{\mathcal{F}_t\}$-adapted process defined for
$\varpi_t > 0$ as

(2.3)  $p_t = \frac{\pi_t}{\varpi_t}$, therefore $E_\infty[p_t] = 1$,

while for $\varpi_t = 0$, we can arbitrarily set $p_t = 1$. Quantity
$\varpi_t$ expresses the aggregate probability that the change will occur
at $t$, whereas $p_t$ describes how this probability is distributed among
the possible events that can occur up to time $t$. Since $\mathcal{F}_0$ is
the trivial $\sigma$-algebra, $\pi_0$ is deterministic, therefore
$\varpi_0 = \pi_0$ and $p_0 = 1$. Clearly $\varpi_0$ expresses the
probability that the change takes place before the statistician obtains any
information.
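
The decomposition in (2.2) and (2.3) can be illustrated on a toy example. The sketch below assumes a finite set of possible histories up to time $t$, each with a known probability under $P_\infty$; the numbers and the helper name `decompose` are hypothetical and serve only to show that $\pi_t = \varpi_t p_t$ with $E_\infty[p_t] = 1$.

```python
def decompose(pi_values, probs):
    """Split pi_t into (varpi_t, p_t) on a finite set of histories.

    pi_values[k] is the value of pi_t on history k (assumed toy data),
    probs[k] is the probability of history k under P_infinity.
    """
    # (2.2): varpi_t is the P_infinity-average of pi_t.
    varpi = sum(q * v for q, v in zip(probs, pi_values))
    if varpi > 0:
        # (2.3): p_t is pi_t normalized by its average.
        p_t = [v / varpi for v in pi_values]
    else:
        # Convention from the text: p_t = 1 when varpi_t = 0.
        p_t = [1.0] * len(pi_values)
    return varpi, p_t


probs = [0.5, 0.3, 0.2]       # P_infinity of three possible histories
pi_values = [0.1, 0.4, 0.0]   # history-dependent change probabilities
varpi, p_t = decompose(pi_values, probs)
mean_p = sum(q * p for q, p in zip(probs, p_t))  # should equal 1
```

When `pi_values` is constant across histories, `p_t` is identically 1, which corresponds to a change mechanism that ignores the observations.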

3. Performance measure and optimization criterion. If $T$ is an
$\{\mathcal{F}_t\}$-adapted s.t. used by the statistician to detect the
change, then we are interested in defining a measure that quantifies its
performance. Following the idea of Lorden (1971) and Pollak (1985), we
propose the use of

$J(T) = E_\tau[T - \tau \mid T > \tau]$,

namely, the average detection delay conditioned on the event that we stop
after $\tau$. Of course this measure makes sense for finite values of
$\tau$ because a change at infinity is regarded as "no change." Since
$J(T) = E_\tau[(T - \tau)^+] / P_\tau[T > \tau]$ and both $(T - t)^+$ and
$1_{\{T > t\}}$ are nonnegative and $\mathcal{F}_\infty$-measurable, by
using (2.1) our measure can be written as

(3.1)  $J(T) = \frac{\sum_{t=0}^{\infty} \varpi_t E_\infty[p_t E_0[(T - t)^+ \mid \mathcal{F}_t]]}{\sum_{t=0}^{\infty} \varpi_t E_\infty[p_t 1_{\{T > t\}}]}$.

If we are interested in finding an optimum $T$, then we must minimize
$J(T)$ with respect to $T$, controlling at the same time the rate of false
alarms. Similarly to Lorden (1971) and Pollak (1985), we propose the
following constrained optimization with respect to $T$:

(3.2)  $\inf_T J(T)$ subject to $E_\infty[T] \ge \gamma$.

In other words, we minimize the conditional average detection delay,
subject to the constraint that the average period between false alarms is
no less than a given value $\gamma > 0$. The performance measure, as we can
see from (3.1), requires complete knowledge of the two processes
$\{\varpi_t\}$ and $\{p_t\}$. In the next subsection we extend our
definition to include cases where the statistics of $\tau$ are not exactly
known or they are limited to special cases.

3.1. Special cases and uncertainty classes. If $\{\varpi_t\}$, $\{p_t\}$
are not known exactly and instead we have available an uncertainty class
$\mathcal{T}$ for $\tau$, then we can extend the definition of our
performance measure by adopting a worst-case approach of the form
$\sup_{\tau \in \mathcal{T}} J(T)$, while (3.2) can be replaced by the
following min-max constrained optimization problem:

(3.3)  $\inf_T \sup_{\tau \in \mathcal{T}} J(T)$ subject to $E_\infty[T] \ge \gamma$.

Next, we are going to identify the particular form of our criterion for
specific change-time classes. In order to facilitate our presentation, we
first introduce a technical lemma.



Lemma 1. Let $\{\varpi_t\}$ and $\{p_t\}$ be the processes defined in
Section 2.1 satisfying (2.2) and (2.3), respectively. If $\{a_t\}$,
$\{b_t\}$ are two nonnegative deterministic sequences then

(3.4)  $\sup_{\{\varpi_t\}} \frac{\sum_{t=0}^{\infty} \varpi_t a_t}{\sum_{t=0}^{\infty} \varpi_t b_t} = \sup_{0 \le t} \frac{a_t}{b_t}$,

where, for $a_t = b_t = 0$, we define the ratio $a_t / b_t = 0$.
Furthermore, if $x_t, y_t$ are two nonnegative and
$\mathcal{F}_t$-measurable random variables then

(3.5)  $\sup_{p_t} \frac{E_\infty[p_t x_t]}{E_\infty[p_t y_t]} = \operatorname{essup} \frac{x_t}{y_t}$,

where, as before, when $x_t = y_t = 0$ we define the ratio
$x_t / y_t = 0$.

Proof. To prove (3.4) notice that since
$a_t \le \{\sup_{0 \le t} (a_t / b_t)\} b_t$, we conclude that for any
sequence $\{\varpi_t\}$ we have

(3.6)  $\frac{\sum_{t=0}^{\infty} \varpi_t a_t}{\sum_{t=0}^{\infty} \varpi_t b_t} \le \sup_{0 \le t} \frac{a_t}{b_t}$.

The upper bound in (3.6) is attainable by a sequence $\{\varpi_t\}$ that
places all its probability mass on the time instant(s) that attain the
supremum. If the supremum is attained in the limit, then for every
$\epsilon > 0$ we can find a sequence $\{\varpi_t\}$ that depends on
$\epsilon$, such that the left-hand side in (3.6) is $\epsilon$-close to
the right-hand side.

Similar arguments apply for the proof of (3.5). Notice that, for every
$p_t \ge 0$ satisfying $E_\infty[p_t] = 1$, the combination
$E_\infty[p_t\, \cdot\,]$ defines a probability measure on $\mathcal{F}_t$
which is absolutely continuous with respect to $P_\infty$. Since
$x_t \le \{\operatorname{essup}(x_t / y_t)\} y_t$, $P_\infty$-a.s., this
leads to

$\frac{E_\infty[p_t x_t]}{E_\infty[p_t y_t]} \le \operatorname{essup} \frac{x_t}{y_t}$.

The upper bound is attainable by a probability measure
$E_\infty[p_t\, \cdot\,]$ that places all its mass on the event(s) that
attain the essup, or we use limiting arguments if the essup is attained as
a limit. □
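
The bound (3.4) is easy to check numerically. The following sketch uses made-up sequences $\{a_t\}$, $\{b_t\}$ of length four (an assumption for illustration; the lemma itself concerns infinite sequences) and compares randomly drawn probability weights against the supremum of the ratios. The function names are hypothetical.

```python
import random


def weighted_ratio(w, a, b):
    """Left-hand side of (3.4) for one fixed weight sequence w."""
    num = sum(wt * at for wt, at in zip(w, a))
    den = sum(wt * bt for wt, bt in zip(w, b))
    return num / den


def sup_ratio(a, b):
    """Right-hand side of (3.4), with the convention 0/0 = 0."""
    ratios = []
    for at, bt in zip(a, b):
        if bt > 0:
            ratios.append(at / bt)
        else:
            ratios.append(0.0 if at == 0 else float("inf"))
    return max(ratios)


def random_probabilities(n, rng):
    """Draw a random probability vector of length n."""
    x = [rng.random() for _ in range(n)]
    total = sum(x)
    return [v / total for v in x]


rng = random.Random(0)
a = [1.0, 4.0, 2.0, 0.5]
b = [2.0, 1.0, 2.0, 1.0]
bound = sup_ratio(a, b)  # attained at t = 1, where a_t / b_t = 4
worst_random = max(
    weighted_ratio(random_probabilities(4, rng), a, b) for _ in range(1000)
)
# A point mass on the maximizing index attains the bound exactly.
point_mass = weighted_ratio([0.0, 1.0, 0.0, 0.0], a, b)
```

Random weights stay below the bound, while the point mass on the maximizing time instant reaches it, mirroring the attainability argument in the proof.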



Let us now proceed with the presentation of specific special cases and
uncertainty classes regarding the two processes $\{\varpi_t\}$, $\{p_t\}$.



Case of known $\varpi_t$ and $p_t = 1$. Here, by selecting $p_t = 1$, we
limit our general change-time model to the case where the probability that
the change will occur at $t$ is independent from the observed history
$\mathcal{F}_t$. The corresponding performance measure simplifies to the
following expression

(3.7)  $J_S(T) = J(T)|_{p_t = 1} = \frac{\sum_{t=0}^{\infty} \varpi_t E_t[(T - t)^+]}{\sum_{t=0}^{\infty} \varpi_t P_\infty[T > t]}$,

where we used (1.1) to replace
$E_\infty[E_0[(T - t)^+ \mid \mathcal{F}_t]]$ with $E_t[(T - t)^+]$. There
is no uncertainty class involved; we have simply limited the change-time
$\tau$ to this special case. We recall that Shiryayev (1978) first
introduced this model for the particular selection
$\varpi_t = (1 - \delta)\delta^t$.



Case of arbitrary $\varpi_t$ and $p_t = 1$. We continue using the same
model of the previous case, but now we let $\{\varpi_t\}$ be an arbitrary
sequence of probabilities satisfying, according to (2.2),
$\sum_{t=0}^{\infty} \varpi_t = 1$. Using (3.4) from Lemma 1 and (1.1), it
is straightforward to prove that

(3.8)  $J_P(T) = \sup_{\{\varpi_t\}} J(T)|_{p_t = 1} = \sup_{0 \le t} E_t[T - t \mid T > t]$.

By considering arbitrary $\{\varpi_t\}$, we recover Pollak's performance
measure. From the way $J_P(T)$ is defined, it is evident that
$J_S(T) \le J_P(T)$.

Regarding the minimization of $J_P(T)$ with respect to the s.t. $T$, Pollak
(1985) proposed the solution of the constrained optimization problem in
(3.3). As candidate optimum s.t. for i.i.d. observations he suggested the
Shiryayev-Roberts stopping rule. Pollak was able to demonstrate asymptotic
optimality (as $\gamma \to \infty$) for this test. Regarding nonasymptotic
optimality of the Shiryayev-Roberts s.t. with respect to this criterion,
see Mei (2006).






Case of arbitrary $\varpi_t$ and arbitrary $p_t$. Here the probability to
stop at time $t$ depends on the observed history $\mathcal{F}_t$; we thus
return to our general change-time model, but we assume complete lack of
knowledge for the change-time probabilities. In order to find the
worst-case performance, we need to maximize $J(T)$ with respect to both
processes $\{\varpi_t\}$ and $\{p_t\}$. We have the following lemma that
treats this problem.

Lemma 2. Let $\{\varpi_t\}$ and $\{p_t\}$ be defined as in Section 2.1
satisfying (2.2) and (2.3) respectively, then

(3.9)  $J_L(T) = \sup_{\{\varpi_t\},\{p_t\}} J(T) = \sup_{0 \le t} \operatorname{essup} E_t[(T - t)^+ \mid \mathcal{F}_t]$.

Proof. Using (3.4) from Lemma 1, for any given sequence $\{p_t\}$ we have

$\sup_{\{\varpi_t\}} J(T) = \sup_{0 \le t} \frac{E_\infty[p_t E_0[(T - t)^+ \mid \mathcal{F}_t]]}{E_\infty[p_t 1_{\{T > t\}}]}$.

Using the fact that we can change the order of two consecutive
maximizations, we have

$\sup_{\{p_t\},\{\varpi_t\}} J(T) = \sup_{\{p_t\}} \sup_{0 \le t} \frac{E_\infty[p_t E_0[(T - t)^+ \mid \mathcal{F}_t]]}{E_\infty[p_t 1_{\{T > t\}}]}$

$\qquad = \sup_{0 \le t} \sup_{p_t} \frac{E_\infty[p_t E_0[(T - t)^+ \mid \mathcal{F}_t]]}{E_\infty[p_t 1_{\{T > t\}}]}$

$\qquad = \sup_{0 \le t} \operatorname{essup} E_0[(T - t)^+ \mid \mathcal{F}_t]$

$\qquad = \sup_{0 \le t} \operatorname{essup} E_t[(T - t)^+ \mid \mathcal{F}_t]$,

where for the third equality we used (3.5) from Lemma 1 and for the last
equality the fact that
$E_0[\cdot \mid \mathcal{F}_t] = E_t[\cdot \mid \mathcal{F}_t]$. This
concludes the proof. □



Here we recover Lorden's performance measure. It is clear that
$J_P(T) \le J_L(T)$, since for Lorden's measure we maximize over $\{p_t\}$
while in $J_P(T)$ we consider $p_t = 1$. As was demonstrated in Moustakides
(1986) and Ritov (1990), solving the optimization problem in (3.3) for
Lorden's criterion and for i.i.d. observations gives rise to the CUSUM test
proposed by Page (1954). It is interesting to mention that Ritov (1990)
based his proof of optimality on a change-time formulation similar to the
one proposed here.

A slight variation of the previous uncertainty class consists in assuming
that the change cannot occur outside a sequence $\{t_n\}_{n \ge 0}$ of
known time instants. In other words, we have $\varpi_t = 0$ if
$t \notin \{t_0, t_1, \ldots\}$. This modifies the previous criterion in
the following way

(3.10)  $J_{EL}(T) = \sup_{\{\varpi_t\},\{p_t\}} J(T) = \sup_{0 \le n} \operatorname{essup} E_{t_n}[(T - t_n)^+ \mid \mathcal{F}_{t_n}]$.

With a more accurate description of the time instants where the change can
occur, one might expect to improve detection as compared to the CUSUM test.
This measure is presented for the first time and will be treated in detail
and under a more interesting frame in Section 4.

It is also possible to examine, under the general model, the case where
$\{\varpi_t\}$ is known and $\{p_t\}$ unknown or, alternatively,
$\{\varpi_t\}$ unknown and $\{p_t\}$ known. Clearly, the first case could
be regarded as an extension of Shiryayev's approach to the general
change-time model proposed here. Unfortunately both cases lead to rather
complicated performance criteria; we therefore omit the corresponding
analysis.

Discussion. From the preceding presentation it is evident that the three
performance measures are ordered in the following way:

$J_S(T) \le J_P(T) \le J_L(T)$,

giving the impression that Lorden's criterion is more pessimistic than
Shiryayev's and Pollak's. This conclusion, however, is misleading since the
underlying change-time model for the Shiryayev and Pollak criteria is
completely different and significantly more limited than Lorden's. We
recall that $J_S(T)$ and $J_P(T)$ rely on the assumption that the change at
time $t$ is triggered with a probability that does not depend on the
observed history $\mathcal{F}_t$. In practice there are clearly
applications where this assumption is false and where it is more realistic
to assume that the observations supply at least some partial information
about the events that can trigger the change. Therefore, whenever we adopt
this logic, Lorden's performance measure becomes more suitable than
Shiryayev's and Pollak's. In the same way that $J_P(T)$ is preferable to
$J_S(T)$ when there is no prior knowledge of $\{\varpi_t\}$ [despite the
fact that $J_P(T)$ is more "pessimistic" than $J_S(T)$], we can also argue
that $J_L(T)$ is preferable to $J_S(T)$ and $J_P(T)$ for problems where we
need to follow the general change-time model and there is no prior
knowledge regarding the change-point mechanism. Even if we still insist
that $J_L(T)$ is overly pessimistic, it has now become clear that $J_S(T)$
and $J_P(T)$ are not the right alternatives, since they correspond to a
drastically different change-time model.

Our previous arguments also suggest a word of caution when evaluating or
comparing performances through Monte Carlo simulations. Selecting the time
of change in an arbitrary way that has no relation to the observation
sequence is equivalent to adopting the restrictive change-time model with
$p_t = 1$. This in turn is expected to favor tests that rely on this
specific selection.

4. Change at observable random times. Let us now attempt a different
parametrization of the change-time $\tau$. Suppose that in addition to the
process $\{\xi_t\}_{t > 0}$ we also observe a strictly increasing sequence
of random times $\{\tau_n\}$, $n = 1, 2, \ldots$. These times correspond to
occurrences of random events that can trigger the change of measures. In
other words, we make the assumption that the change can occur only at the
observable time instants $\{\tau_n\}$. We would like to emphasize that we
consider the flow of the observation sequence $\{\xi_t\}$ to be continuous
and not synchronized in any sense with the random times $\{\tau_n\}$. It
is, therefore, clear that detection can be performed at any time instant,
that is, even between occurrences.

There are interesting applications that can be modeled with this setup. One
example is earthquake damage detection in structures, where earthquakes
occurring at (observable) random times can trigger a change (damage), while
detection is performed by continuously acquiring vibration measurements
from the structure. A similar application is the detection of a change in
financial data after "major importance events" or, as reported by Rodionov
and Overland (2005), detection of regime shifts in sea ecosystems due to
(observable) changes in the climate system.

Let us now relate our problem to the change-time model introduced in the
previous section. Consider the strictly increasing sequence of occurrence
times $\{\tau_n\}$, $n = 1, 2, \ldots$. Since we assume that observations
are available after time 0, it is clear that $\tau_1 > 0$; therefore, we
arbitrarily include $\tau_0 = 0$ into our sequence. Notice that $\tau_0$
does not necessarily correspond to a real occurrence. This term is needed
to account for the case where the change took place before any observation
was taken. If $N_t$ denotes the number of observed occurrences up to (and
including) time $t$, that is,

$N_t = \sup_{0 \le n} \{n : \tau_n \le t\}$,

then we can define our filtration $\{\mathcal{F}_t\}$ as
$\mathcal{F}_t = \sigma\{\xi_s, N_s;\ 0 < s \le t\}$ and $\mathcal{F}_0$ to
be the trivial $\sigma$-algebra. With this filtration the random times
$\tau_n$ are transformed immediately into s.t. adapted to
$\{\mathcal{F}_t\}$ (since by consulting the history $\mathcal{F}_t$ we can
directly deduce whether $\tau_n \le t$ is true). The probability $\pi_t$
now takes the special form

$\pi_t = \sum_{n=0}^{\infty} \pi_n 1_{\{\tau_n = t\}}$,

where $\pi_n$ is $\mathcal{F}_{\tau_n}$-measurable. As we can see, the
resulting $\pi_t$ is nonzero only if we have an occurrence at $t$.

By decomposing $\pi_n = \varpi_n p_n$ with
$\sum_{n=0}^{\infty} \varpi_n = 1$ and $E_\infty[p_n] = 1$, we can define
the equivalent of all performance measures introduced in Section 3.1. We
limit our presentation to Lorden's measure since this is the case we are
going to treat in detail. If we use the last equation for $\pi_t$ in (3.1),
we obtain the following form for our performance measure $J(T)$:

(4.1)  $J(T) = \frac{\sum_{n=0}^{\infty} \varpi_n E_\infty[p_n E_0[(T - \tau_n)^+ \mid \mathcal{F}_{\tau_n}]]}{\sum_{n=0}^{\infty} \varpi_n E_\infty[p_n 1_{\{T > \tau_n\}}]}$.

Assuming no prior knowledge for $\{\varpi_n\}$ and $\{p_n\}$, we have to
maximize $J(T)$ in (4.1) with respect to the two processes. This leads to
the following extended Lorden measure:

(4.2)  $J_{EL}(T) = \sup_{\{\varpi_n\},\{p_n\}} J(T) = \sup_{0 \le n} \operatorname{essup} E_{\tau_n}[(T - \tau_n)^+ \mid \mathcal{F}_{\tau_n}]$.

The difference with the previous definition of $J_{EL}(T)$ in (3.10) is
that the time instants $\tau_n$ are now s.t. instead of deterministic
times.



4.1. Detection of a change in the constant drift of a Brownian motion.
Although it is possible to analyze the problem of detecting a change in the
pdf of i.i.d. observations, we prefer to consider the continuous-time
alternative of detecting a change in the constant drift of a Brownian
motion (BM). This is because the corresponding solution is more elegant,
offering formulas for the optimum performance and therefore allowing for
direct comparison with the classical CUSUM test. Thus, let us assume that
the observation process $\{\xi_t\}$ is a BM satisfying
$\xi_t = \mu (t - \tau)^+ + w_t$, where $w_t$ is a standard Wiener process
and $\mu$ a known constant drift. For the change-time $\tau$ we assume that
it can be equal to any $\tau_n$ from the observable sequence of s.t.
$\{\tau_n\}$. Finally, for the occurrence times $\{\tau_n\}$ we assume that
they are Poisson distributed with a constant rate $\lambda$ and independent
from the observation process $\{\xi_t\}$.

We recall that the problem of detecting a change in the drift of a BM has
been considered with the classical Lorden measure (where occurrences are
not taken into account, therefore the change is assumed to happen at any
time instant) by Shiryayev (1996) and Beibel (1996) and under a more
general framework by Moustakides (2004).

If we denote with $u_t$ the log-likelihood ratio between the two
probability measures, then $u_t = -0.5\mu^2 t + \mu \xi_t$. Let us consider
the following process $\{m_t\}$:

$m_0 = 0; \quad m_t = m_0 \wedge \Bigl(\inf_{1 \le n \le N_t} u_{\tau_n}\Bigr) = \Bigl(\inf_{1 \le n \le N_t} u_{\tau_n}\Bigr)^-$,

where $x^- = \min\{x, 0\}$. Notice that $m_t$ starts from 0 and becomes the
running minimum of the process $\{u_t\}$ but updated only at the occurrence
times. We can now define the extended CUSUM (ECUSUM) process as follows:

$y_t = u_t - m_t$

and the corresponding ECUSUM s.t. with threshold $\nu \ge 0$ as

$S_\nu = \inf\{t : y_t \ge \nu\}$.

As opposed to the CUSUM process which is always nonnegative, the ECUSUM
process $y_t$ can take negative values as well.
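
To make the construction concrete, here is a rough simulation sketch of the ECUSUM statistic on a time grid. The Euler discretization with step `dt`, the update of the running minimum at the first grid point after each occurrence, and all names and default values are assumptions made for illustration; they are not part of the continuous-time definition and introduce a discretization error.

```python
import math
import random


def simulate_ecusum(mu, lam, nu, change_time=math.inf,
                    dt=1e-2, t_max=1e3, seed=0):
    """Simulate y_t = u_t - m_t on a grid and return (stopping time, y).

    mu: post-change drift, lam: Poisson rate of occurrences,
    nu: threshold, change_time: tau (math.inf means no change).
    Returns (math.inf, last y) if the threshold is not reached by t_max.
    """
    rng = random.Random(seed)
    t = 0.0
    xi = 0.0                       # observation process xi_t
    m = 0.0                        # running minimum, starts at m_0 = 0
    u = 0.0                        # log-likelihood ratio u_t
    next_occ = rng.expovariate(lam)  # first occurrence time tau_1
    while t < t_max:
        drift = mu if t >= change_time else 0.0
        xi += drift * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
        u = -0.5 * mu ** 2 * t + mu * xi
        # m_t is updated only at occurrence times (grid approximation).
        while next_occ <= t:
            m = min(m, u)
            next_occ += rng.expovariate(lam)
        y = u - m
        if y >= nu:
            return t, y
    return math.inf, u - m


# Change at time 0: every observation is under the alternative regime.
t_stop, y_stop = simulate_ecusum(mu=1.0, lam=1.0, nu=3.0, change_time=0.0)
```

Between occurrences `m` is frozen, so `y` follows `u` and may become negative; at an occurrence with `u` below `m`, the update resets `y` to 0.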

Figure 1 depicts an example of the paths of $\{u_t\}$ and $\{m_t\}$.
Process $\{m_t\}$ is piecewise constant with right-continuous paths and can
exhibit jumps at the occurrence instances $\{\tau_n\}$. The ECUSUM process
$\{y_t\}$, being the difference of $\{u_t\}$ (which is continuous) and
$\{m_t\}$, is also right continuous with continuous paths between
occurrences. From Figure 1, we can also deduce that $\{y_t\}$ exhibits a
jump at $\tau_n$ only if $y_{\tau_n -} < 0$, in which case $y_{\tau_n}$
becomes 0. This can be written as

(4.3)  $y_{\tau_n} = (y_{\tau_n -})^+$.

For technical reasons, it is also necessary to introduce a version of
ECUSUM which can start from any value $y_0 = y$ (as compared to the regular
version which starts at $y_0 = 0$). For this we simply have to assume that
$u_t = y - 0.5\mu^2 t + \mu \xi_t$, while $m_t$, $y_t$ and the s.t. are
defined as before. To distinguish this new version of the s.t. from the
regular one let us denote it as $\mathcal{S}_\nu$. It is then clear that
$\mathcal{S}_\nu = S_\nu$ for $y = 0$. Since inter-occurrence times are
i.i.d. and independent from the past, the process $\{y_{\tau_n}\}$ is
Markov and $\mathcal{S}_\nu$ given that $y_0 = y$ has the same statistics
as $(\mathcal{S}_\nu - \tau_n)^+$ given that $y_{\tau_n} = y$ and
$\mathcal{S}_\nu > \tau_n$.

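4.2. Performance evaluation of the ECUSUM test. In this subsection, we are
going to obtain a formula for the expectation of $\mathcal{S}_\nu$. We
first present a lemma that states an important property for this quantity.

Lemma 3. Let the occurrences be Poisson distributed with rate $\lambda$,
then the average $E[\mathcal{S}_\nu \mid y_0 = y]$ is decreasing in $y$ and
for every $y$ we have

(4.4)  $E[\mathcal{S}_\nu \mid y_0 = y] \le \frac{1}{\lambda} + E[S_\nu]$.

Proof. The paths of $\{y_t\}$ are increasing in $y_0 = y$, therefore
$\mathcal{S}_\nu$ is decreasing in $y$ and so is its expectation;
consequently for $y \ge 0$ we have
$E[\mathcal{S}_\nu \mid y_0 = y] \le E[\mathcal{S}_\nu \mid y_0 = 0] = E[S_\nu]$.
Assume now that $y < 0$, then since
$\mathcal{S}_\nu \le \tau_1 + (\mathcal{S}_\nu - \tau_1)^+$, by taking
expectation we can write

$E[\mathcal{S}_\nu \mid y_0 = y] \le E[\tau_1] + E[(\mathcal{S}_\nu - \tau_1)^+ \mid y_0 = y]$

$\qquad = E[\tau_1] + E[(\mathcal{S}_\nu - \tau_1)^+ \mid y_0 = y, \mathcal{S}_\nu > \tau_1]\, P[\mathcal{S}_\nu > \tau_1 \mid y_0 = y]$

$\qquad \le E[\tau_1] + E[(\mathcal{S}_\nu - \tau_1)^+ \mid y_0 = y, \mathcal{S}_\nu > \tau_1]$

$\qquad = E[\tau_1] + E[E[(\mathcal{S}_\nu - \tau_1)^+ \mid y_{\tau_1}, \mathcal{S}_\nu > \tau_1] \mid y_0 = y]$

$\qquad \le E[\tau_1] + \sup_{z \ge 0} E[(\mathcal{S}_\nu - \tau_1)^+ \mid y_{\tau_1} = z, \mathcal{S}_\nu > \tau_1]$

$\qquad = \frac{1}{\lambda} + \sup_{z \ge 0} E[\mathcal{S}_\nu \mid y_0 = z]$

$\qquad = \frac{1}{\lambda} + E[\mathcal{S}_\nu \mid y_0 = 0]$

$\qquad = \frac{1}{\lambda} + E[S_\nu]$,

where we have used the property that $(\mathcal{S}_\nu - \tau_1)^+$
conditioned on the event that $\mathcal{S}_\nu > \tau_1$ and
$y_{\tau_1} = y$ has the same statistics as $\mathcal{S}_\nu$ given
$y_0 = y$ and, furthermore, that at an occurrence the ECUSUM statistic is
nonnegative. This concludes the proof. □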

From Lemma 3 we deduce that $E[\mathcal{S}_\nu \mid y_0 = y]$ is decreasing
and uniformly bounded in $y$. Let us now proceed with the computation of
$E[\mathcal{S}_\nu \mid y_0 = y]$. We have the following theorem that
provides the desired formula.



Theorem 1. Let $u_t = y + at + b w_t$ with $\{w_t\}$ a standard Wiener
process with $w_0 = 0$ and $a, b \ne 0$. Define the ECUSUM s.t.
$\mathcal{S}_\nu$ as above, with the occurrences being Poisson distributed
with rate $\lambda$, then for $y \le \nu$ the expectation of
$\mathcal{S}_\nu$ is given by the following expression:

(4.5)  $E[\mathcal{S}_\nu \mid y_0 = y] = \begin{cases} \frac{1}{a}\bigl[-y + \nu + A(e^{-2ay/b^2} - e^{-2a\nu/b^2})\bigr], & y \ge 0, \\ \frac{1}{a}\bigl[\nu + A(1 - e^{-2a\nu/b^2})\bigr] + \frac{1}{\lambda}\bigl[1 - e^{ry}\bigr], & y \le 0, \end{cases}$

where

$r = \frac{-a + \sqrt{a^2 + 2\lambda b^2}}{b^2}, \qquad A = \frac{b^2}{2a}\Bigl(\frac{a r}{\lambda} - 1\Bigr)$.

Proof. Denote with $f(y)$ the function in the right-hand side of (4.5)
which, as we can verify, is twice continuously differentiable, strictly
decreasing in $y$ and uniformly bounded for $-\infty < y \le \nu$. Consider
now the difference $f(y_t) - f(y_0)$; we can then write

$f(y_t) - f(y_0) = f(y_t) - f(y_{\tau_{N_t}}) + \sum_{n=1}^{N_t} [f(y_{\tau_n}) - f(y_{\tau_{n-1}})]$

$\qquad = f(y_t) - f(y_{\tau_{N_t}}) + \sum_{n=1}^{N_t} [f(y_{\tau_n -}) - f(y_{\tau_{n-1}})] + \sum_{n=1}^{N_t} [f(y_{\tau_n}) - f(y_{\tau_n -})]$,

where we used the fact that $\{y_t\}$ is right continuous. In the time
interval $[\tau_{n-1}, \tau_n)$, the process $y_t$ has continuous paths and
$m_t$ is constant, therefore using Itô calculus we can write

$f(y_{\tau_n -}) - f(y_{\tau_{n-1}}) = \int_{\tau_{n-1}}^{\tau_n} [a f'(y_{s-}) + 0.5 b^2 f''(y_{s-})]\, ds + \int_{\tau_{n-1}}^{\tau_n} b f'(y_{s-})\, dw_s$.

If $t$ is not an occurrence, a similar expression holds for the time
interval $[\tau_{N_t}, t)$. This suggests that

$f(y_t) - f(y_{\tau_{N_t}}) + \sum_{n=1}^{N_t} [f(y_{\tau_n -}) - f(y_{\tau_{n-1}})] = \int_0^t [a f'(y_{s-}) + 0.5 b^2 f''(y_{s-})]\, ds + \int_0^t b f'(y_{s-})\, dw_s$.

The sum involving the jumps, using (4.3), can be written as

$\sum_{n=1}^{N_t} [f(y_{\tau_n}) - f(y_{\tau_n -})] = \sum_{n=1}^{N_t} [f((y_{\tau_n -})^+) - f(y_{\tau_n -})] = \int_0^t [f((y_{s-})^+) - f(y_{s-})]\, dN_s$.

Combining the two expressions leads to

(4.6)  $f(y_t) - f(y_0) = \int_0^t [a f'(y_{s-}) + 0.5 b^2 f''(y_{s-})]\, ds + \int_0^t b f'(y_{s-})\, dw_s + \int_0^t [f((y_{s-})^+) - f(y_{s-})]\, dN_s$.

For any integer $n$ let $\mathcal{S}_\nu^n = \mathcal{S}_\nu \wedge n$.
Then we know that for a process $\{\omega_t\}$ which is
$\{\mathcal{F}_t\}$-adapted and uniformly bounded in the sense that
$|\omega_t| \le c < \infty$, we have from Protter (2004) that
$E[\int_0^{\mathcal{S}_\nu^n} \omega_{s-}\, dN_s] = E[\int_0^{\mathcal{S}_\nu^n} \omega_{s-} \lambda\, ds]$
and from Karatzas and Shreve (1988) that
$E[\int_0^{\mathcal{S}_\nu^n} \omega_{s-}\, dw_s] = 0$. Replacing $t$ with
the s.t. $\mathcal{S}_\nu^n$ in (4.6), taking expectation and using the
fact that $f(y^+) - f(y)$ and $f'(y)$ are uniformly bounded for
$y \in (-\infty, \nu]$, allows us to write

$E[f(y_{\mathcal{S}_\nu^n})] - f(y_0) = E\Bigl[\int_0^{\mathcal{S}_\nu^n} \{a f'(y_{t-}) + 0.5 b^2 f''(y_{t-}) + \lambda [f((y_{t-})^+) - f(y_{t-})]\}\, dt\Bigr]$.

It is straightforward to verify that the function $f(y)$ is a solution to
the differential equation

(4.7)  $a f'(y) + 0.5 b^2 f''(y) + \lambda [f(y^+) - f(y)] = -1, \quad -\infty < y \le \nu$.

This, if substituted in the previous expression, yields

$f(y_0) - E[f(y_{\mathcal{S}_\nu^n})] = E[\mathcal{S}_\nu^n]$.

Letting now $n \to \infty$, we have
$\mathcal{S}_\nu^n \to \mathcal{S}_\nu$ monotonically. In the previous
equality, using monotone convergence on the right-hand side and bounded
convergence [since $f(y)$ is uniformly bounded] on the left, we obtain

$f(y_0) - E[f(y_{\mathcal{S}_\nu})] = E[\mathcal{S}_\nu]$.

At the time of stopping the process $\{y_t\}$ hits the threshold $\nu$ (see
Figure 1), therefore, we have $y_{\mathcal{S}_\nu} = \nu$, suggesting that
$E[f(y_{\mathcal{S}_\nu})] = f(\nu)$. We can now verify that $f(\nu) = 0$,
which yields $f(y_0) = E[\mathcal{S}_\nu]$ and completes the proof. □



Remark 1. One might wonder why (4.5) is the desired formula and not any
other solution of the differential equation in (4.7) that satisfies the
boundary condition $f(\nu) = 0$. It turns out that among the solutions of
(4.7) that are twice continuously differentiable (a property needed to
apply Itô calculus) and satisfy the boundary condition $f(\nu) = 0$, the
formula in (4.5) is the unique solution which is uniformly bounded in
$(-\infty, \nu]$ (a property imposed by Lemma 3).
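
The properties listed in Remark 1 can be checked numerically. The sketch below evaluates a candidate closed form for the expected stopping time and tests it against the differential equation (4.7) by central finite differences. The closed form is an assumption of this sketch: it is the bounded, twice continuously differentiable solution of (4.7) with $f(\nu) = 0$ as worked out by hand, with $r = (-a + \sqrt{a^2 + 2\lambda b^2})/b^2$ and $A = (b^2/2a)(ar/\lambda - 1)$. The function names and parameter values are hypothetical.

```python
import math


def ecusum_mean_time(y, nu, a, b, lam):
    """Candidate f(y) = E[stopping time | y_0 = y] for y <= nu."""
    r = (-a + math.sqrt(a * a + 2.0 * lam * b * b)) / (b * b)
    big_a = (b * b / (2.0 * a)) * (a * r / lam - 1.0)
    k = 2.0 * a / (b * b)
    if y >= 0.0:
        return (-y + nu
                + big_a * (math.exp(-k * y) - math.exp(-k * nu))) / a
    # Value at y = 0 plus the bounded correction for negative starts.
    f0 = (nu + big_a * (1.0 - math.exp(-k * nu))) / a
    return f0 + (1.0 - math.exp(r * y)) / lam


def ode_residual(y, nu, a, b, lam, h=1e-4):
    """Left side of (4.7) plus 1; should vanish if f solves (4.7)."""
    def f(z):
        return ecusum_mean_time(z, nu, a, b, lam)

    d1 = (f(y + h) - f(y - h)) / (2.0 * h)            # approximates f'
    d2 = (f(y + h) - 2.0 * f(y) + f(y - h)) / (h * h)  # approximates f''
    jump = f(max(y, 0.0)) - f(y)                      # f(y^+) - f(y)
    return a * d1 + 0.5 * b * b * d2 + lam * jump + 1.0


# Example: a = 0.5, b = 1 corresponds to mu = 1 under the alternative.
res_pos = ode_residual(1.0, nu=3.0, a=0.5, b=1.0, lam=1.0)
res_neg = ode_residual(-1.0, nu=3.0, a=0.5, b=1.0, lam=1.0)
boundary = ecusum_mean_time(3.0, nu=3.0, a=0.5, b=1.0, lam=1.0)
```

Both residuals should be zero up to finite-difference error, and `boundary` should be exactly zero, matching the condition $f(\nu) = 0$.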



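By letting $\lambda \to \infty$ and setting $y = 0$ in (4.5), we recover
the average run length of the classical CUSUM test as obtained in Taylor
(1975). If we denote by $g_\nu(y)$, $h_\nu(y)$ the average of
$\mathcal{S}_\nu$ under $P_0$ and $P_\infty$ respectively, then under $P_0$
we have $u_t = y - 0.5\mu^2 t + \mu \xi_t = y + 0.5\mu^2 t + \mu w_t$,
therefore by substituting $a = 0.5\mu^2$, $b = \mu$ in (4.5), we can write

$g_\nu(y) = E_0[\mathcal{S}_\nu \mid y_0 = y] = \begin{cases} \frac{2}{\mu^2}\bigl[-y + \nu + A_0 (e^{-y} - e^{-\nu})\bigr], & y \ge 0, \\ \frac{2}{\mu^2}\bigl[\nu + A_0 (1 - e^{-\nu})\bigr] + \frac{1}{\lambda}\bigl[1 - e^{r_0 y}\bigr], & y \le 0, \end{cases}$

where

$r_0 = \frac{-1 + \sqrt{1 + 8\lambda/\mu^2}}{2}, \qquad A_0 = \frac{\mu^2 r_0}{2\lambda} - 1$.

Similarly, substituting $a = -0.5\mu^2$, $b = \mu$, we obtain

$h_\nu(y) = E_\infty[\mathcal{S}_\nu \mid y_0 = y] = \begin{cases} \frac{2}{\mu^2}\bigl[y - \nu + A_\infty (e^{\nu} - e^{y})\bigr], & y \ge 0, \\ \frac{2}{\mu^2}\bigl[-\nu + A_\infty (e^{\nu} - 1)\bigr] + \frac{1}{\lambda}\bigl[1 - e^{r_\infty y}\bigr], & y \le 0, \end{cases}$

where

$r_\infty = \frac{1 + \sqrt{1 + 8\lambda/\mu^2}}{2}, \qquad A_\infty = \frac{\mu^2 r_\infty}{2\lambda} + 1$.

To compute the performance of the regular ECUSUM s.t. we must set $y = 0$
in the previous formulas. It is then clear that $g_\nu(0)$ expresses the
(worst) average detection delay and $h_\nu(0)$ the average period between
false alarms for $S_\nu$. Specifically, after noticing that
$r_0 r_\infty = 2\lambda/\mu^2$, we have

(4.8)  $g_\nu(0) = E_0[S_\nu] = \frac{2}{\mu^2}\Bigl\{[\nu - 1 + e^{-\nu}] + \frac{1}{r_\infty}(1 - e^{-\nu})\Bigr\}$,

(4.9)  $h_\nu(0) = E_\infty[S_\nu] = \frac{2}{\mu^2}\Bigl\{[e^{\nu} - \nu - 1] + \frac{1}{r_0}(e^{\nu} - 1)\Bigr\}$,

where the first term in both right-hand side expressions corresponds to the
performance of the classical CUSUM test (obtained by letting
$\lambda \to \infty$).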
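
The relation between the ECUSUM and CUSUM expressions can be explored with a few lines of code. This sketch takes the average delay to be $(2/\mu^2)\{[\nu - 1 + e^{-\nu}] + (1 - e^{-\nu})/r_\infty\}$ and the average false alarm period to be $(2/\mu^2)\{[e^{\nu} - \nu - 1] + (e^{\nu} - 1)/r_0\}$, with $r_0 = (-1 + \sqrt{1 + 8\lambda/\mu^2})/2$ and $r_\infty = r_0 + 1$; these closed forms and all function names are assumptions of the sketch rather than a verified transcription.

```python
import math


def cusum_delay(nu, mu):
    """Classical CUSUM average detection delay at threshold nu."""
    return (2.0 / mu ** 2) * (nu - 1.0 + math.exp(-nu))


def cusum_false_alarm_period(nu, mu):
    """Classical CUSUM average period between false alarms."""
    return (2.0 / mu ** 2) * (math.exp(nu) - nu - 1.0)


def ecusum_delay(nu, mu, lam):
    """Assumed ECUSUM delay: CUSUM term plus an occurrence correction."""
    r_inf = 0.5 * (1.0 + math.sqrt(1.0 + 8.0 * lam / mu ** 2))
    extra = (2.0 / mu ** 2) * (1.0 - math.exp(-nu)) / r_inf
    return cusum_delay(nu, mu) + extra


def ecusum_false_alarm_period(nu, mu, lam):
    """Assumed ECUSUM false alarm period: CUSUM term plus correction."""
    r_0 = 0.5 * (-1.0 + math.sqrt(1.0 + 8.0 * lam / mu ** 2))
    extra = (2.0 / mu ** 2) * (math.exp(nu) - 1.0) / r_0
    return cusum_false_alarm_period(nu, mu) + extra


nu, mu = 2.0, 1.0
# Very frequent occurrences: the correction terms become negligible.
gap_delay = ecusum_delay(nu, mu, 1e12) - cusum_delay(nu, mu)
gap_period = (ecusum_false_alarm_period(nu, mu, 1e12)
              - cusum_false_alarm_period(nu, mu))
```

At a fixed threshold both ECUSUM quantities exceed their CUSUM counterparts, so a fair comparison of the two tests has to be made at equal false alarm period rather than at equal threshold.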

Figure 2 depicts the normalized average detection delay
$\mu^2 g_\nu(0)/2$ as a function of the normalized average false alarm
period $\mu^2 h_\nu(0)/2$, for different values of the ratio
$\mu^2/\lambda$. We observe that, on average, ECUSUM detects the change
faster than CUSUM. Of course this is not surprising since ECUSUM has
available more information than CUSUM (CUSUM does not observe the
occurrences). We can also see that the performance difference between the
two schemes, for a given value of $\mu^2/\lambda$, is uniformly bounded by
a constant. Finally, we conclude that the gain obtained by using ECUSUM
instead of CUSUM becomes significant only for large values of the parameter
$\mu^2/\lambda$ or, equivalently, when the occurrences that can trigger the
change are very infrequent.

Fig. 2. Normalized average detection delay of ECUSUM as a function of
normalized average false alarm period, for different values of the
parameter $\mu^2/\lambda$.
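
Because the average false alarm period grows with the threshold, a threshold that meets a prescribed false alarm level $\gamma$ can be found by simple bisection. The sketch below assumes the closed form $(2/\mu^2)\{[e^{\nu} - \nu - 1] + (e^{\nu} - 1)/r_0\}$ with $r_0 = (-1 + \sqrt{1 + 8\lambda/\mu^2})/2$ for that period; the function names, the tolerance, and the example values are hypothetical.

```python
import math


def false_alarm_period(nu, mu, lam):
    """Assumed average period between false alarms at threshold nu."""
    r_0 = 0.5 * (-1.0 + math.sqrt(1.0 + 8.0 * lam / mu ** 2))
    return (2.0 / mu ** 2) * (
        (math.exp(nu) - nu - 1.0) + (math.exp(nu) - 1.0) / r_0
    )


def solve_threshold(gamma, mu, lam, tol=1e-10):
    """Bisection for the threshold whose false alarm period is gamma."""
    lo, hi = 0.0, 1.0
    # The period is 0 at nu = 0 and increasing, so bracket the root first.
    while false_alarm_period(hi, mu, lam) < gamma:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if false_alarm_period(mid, mu, lam) < gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Example with made-up values: mu = 1, lam = 1, gamma = 1000.
nu_star = solve_threshold(1000.0, mu=1.0, lam=1.0)
achieved = false_alarm_period(nu_star, mu=1.0, lam=1.0)
```

Bisection is chosen here only for its robustness; any monotone root finder would serve equally well.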

5. Optimality of ECUSUM. Using the formula in (4.9) for the average
period between false alarms, we can relate the threshold $\nu$ to the false alarm
constraint parameter $\gamma$ through the equation

$$ h_\nu(0) = \frac{2}{\mu^2}\left\{\left[e^{\nu} - \nu - 1\right] + \frac{1}{r_0}\left(e^{\nu} - 1\right)\right\} = \gamma. $$

The left-hand side of the last equality is increasing in $\nu$; for $\nu = 0$ it is
equal to 0 and for $\nu \to \infty$ it tends to $\infty$. We can therefore conclude that,
for given $\gamma > 0$, the last equation has a unique solution, which we call $\nu_\star$.
Since $\nu_\star$ is the solution to the previous equation, it is clear that

$$ h_{\nu_\star}(0) = \gamma. \tag{5.1} $$

Using $\nu_\star$ as threshold we can now define the corresponding ECUSUM s.t.
$S_{\nu_\star}$. Our goal in the sequel is to demonstrate that this test is optimum in
the extended Lorden sense. We recall that the occurrence times are Poisson
distributed with a known constant rate $\lambda$. Observe, however, that $\lambda$ enters
only in the correspondence between the threshold $\nu_\star$ and the constraint $\gamma$,
without affecting the ECUSUM test otherwise.

Consider the functions $g_{\nu_\star}(y)$, $h_{\nu_\star}(y)$ associated with $S_{\nu_\star}$. Both functions
will play a key role in our proof of optimality. The next lemma presents an
important property for each function, which is an immediate consequence of
Theorem 1.
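Returning to the threshold equation (5.1): since its left-hand side is increasing in $\nu$, the solution $\nu_\star$ can be found by simple bisection. The Python sketch below is one way to do this, assuming the closed form $r_0 = (\sqrt{1 + 8\lambda/\mu^2} - 1)/2$; the names `false_alarm_period` and `solve_threshold` are hypothetical, and the bracket-doubling step is only intended for moderate values of $\gamma$, since `math.exp` overflows for very large arguments.

```python
import math


def false_alarm_period(nu, mu, lam):
    """h_nu(0) from (4.9), with r0 taken as the assumed positive root."""
    r0 = 0.5 * (math.sqrt(1.0 + 8.0 * lam / mu ** 2) - 1.0)
    return (2.0 / mu ** 2) * ((math.exp(nu) - nu - 1.0) + (math.exp(nu) - 1.0) / r0)


def solve_threshold(gamma, mu, lam, tol=1e-12):
    """Bisection for the threshold nu with false_alarm_period(nu) == gamma, gamma > 0."""
    if gamma <= 0.0:
        raise ValueError("gamma must be positive")
    lo, hi = 0.0, 1.0
    # Grow the bracket until the false alarm period at `hi` reaches gamma.
    while false_alarm_period(hi, mu, lam) < gamma:
        hi *= 2.0
    # Standard bisection on a monotone function.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if false_alarm_period(mid, mu, lam) < gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, `solve_threshold(100.0, 1.0, 0.5)` returns a threshold at which `false_alarm_period` is numerically equal to 100. Note that $\lambda$ influences only this calibration step, as remarked in the text.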
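Lemma 4. If $T$ is a s.t. and $S_\nu$ the regular ECUSUM s.t. with threshold
$\nu$, define $T_\nu = T \wedge S_\nu$, then

$$ E_\infty\left[h_\nu(0) - h_\nu(y_{T_\nu})\right] = E_\infty[T_\nu], \tag{5.2} $$

$$ E_{\tau_n}\left[\{g_\nu(y_{\tau_n}) - g_\nu(y_{T_\nu})\}\mathbb{1}_{\{T_\nu > \tau_n\}} \mid \mathcal{F}_{\tau_n}\right] = E_{\tau_n}\left[(T_\nu - \tau_n)^+ \mid \mathcal{F}_{\tau_n}\right]. \tag{5.3} $$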

With the next theorem we provide a suitable lower bound for the extended 
Lorden measure. First, we introduce a technical lemma. 

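Lemma 5. Let $T$ be a s.t. and define $T_\nu = T \wedge S_\nu$, then

$$ E_\infty\left[e^{y_{T_\nu}}\right] \ge 1. $$

Proof. Following similar steps as in the proof of Theorem 1, if $f(y)$
is a twice continuously differentiable function with $f(y^+) - f(y)$ and $f'(y)$
uniformly bounded for $y \le \nu$, we have

$$ E_\infty\left[f(y_{T_\nu})\right] - f(y_0) = E_\infty\left[\int_0^{T_\nu}\left\{-0.5\mu^2 f'(y_{t-}) + 0.5\mu^2 f''(y_{t-}) + \lambda\left[f\left((y_{t-})^+\right) - f(y_{t-})\right]\right\}dt\right]. $$

For $f(y) = e^y$, and recalling that we treat the regular ECUSUM case with
$y_0 = 0$, we immediately obtain that

$$ E_\infty\left[e^{y_{T_\nu}}\right] - 1 = E_\infty\left[\int_0^{T_\nu}\lambda\left(e^{(y_{s-})^+} - e^{y_{s-}}\right)ds\right] = \lambda E_\infty\left[\int_0^{T_\nu}\left(1 - e^{y_{s-}}\right)^+ ds\right] \ge 0. $$

This concludes the proof. □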



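Theorem 2. For any s.t. $T$ let $T_\nu = T \wedge S_\nu$, then

$$ \mathcal{J}_{\mathrm{EL}}(T) \ge g_\nu(0) - \frac{E_\infty\left[e^{y_{T_\nu}} g_\nu(y_{T_\nu})\right]}{E_\infty\left[e^{y_{T_\nu}}\right]}. \tag{5.4} $$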



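Proof. The proof follows similar steps as in Theorem 2, Moustakides
(2004). Since $T \ge T_\nu$, it is clear that $\mathcal{J}_{\mathrm{EL}}(T) \ge \mathcal{J}_{\mathrm{EL}}(T_\nu)$. Also, from the
definition of $\mathcal{J}_{\mathrm{EL}}(\cdot)$ in (4.2) we conclude

$$ \mathcal{J}_{\mathrm{EL}}(T) \ge \mathcal{J}_{\mathrm{EL}}(T_\nu) \ge E_{\tau_n}\left[(T_\nu - \tau_n)^+ \mid \mathcal{F}_{\tau_n}\right], \qquad n = 0, 1, 2, \ldots. \tag{5.5} $$

Using (5.3) from Lemma 4, the previous inequality for $n \ge 1$ can be written
as

$$ \mathcal{J}_{\mathrm{EL}}(T) \ge E_{\tau_n}\left[\{g_\nu(y_{\tau_n}) - g_\nu(y_{T_\nu})\}\mathbb{1}_{\{T_\nu > \tau_n\}} \mid \mathcal{F}_{\tau_n}\right]. $$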






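Multiplying both sides with the nonnegative quantity $(1 - e^{m_{\tau_n} - m_{\tau_{n-1}}})\mathbb{1}_{\{T_\nu > \tau_n\}}$
and taking expectation with respect to $P_\infty$ yields

$$ \mathcal{J}_{\mathrm{EL}}(T)\,E_\infty\left[\left(1 - e^{m_{\tau_n} - m_{\tau_{n-1}}}\right)\mathbb{1}_{\{T_\nu > \tau_n\}}\right] \ge E_\infty\left[e^{u_{T_\nu} - u_{\tau_n}}\left(1 - e^{m_{\tau_n} - m_{\tau_{n-1}}}\right)\{g_\nu(y_{\tau_n}) - g_\nu(y_{T_\nu})\}\mathbb{1}_{\{T_\nu > \tau_n\}}\right]. \tag{5.6} $$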



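Now notice that $1 - e^{m_{\tau_n} - m_{\tau_{n-1}}}$ is different from 0 only when there is a
jump in $m_t$ at $\tau_n$, in which case $y_{\tau_n} = 0$ and $m_{\tau_n} = u_{\tau_n}$. This means that

$$ \begin{aligned} &E_\infty\left[e^{u_{T_\nu} - u_{\tau_n}}\left(1 - e^{m_{\tau_n} - m_{\tau_{n-1}}}\right)\{g_\nu(y_{\tau_n}) - g_\nu(y_{T_\nu})\}\mathbb{1}_{\{T_\nu > \tau_n\}}\right] \\ &\qquad = E_\infty\left[e^{u_{T_\nu} - m_{\tau_n}}\left(1 - e^{m_{\tau_n} - m_{\tau_{n-1}}}\right)\{g_\nu(0) - g_\nu(y_{T_\nu})\}\mathbb{1}_{\{T_\nu > \tau_n\}}\right] \\ &\qquad = E_\infty\left[e^{u_{T_\nu}}\left(e^{-m_{\tau_n}} - e^{-m_{\tau_{n-1}}}\right)\{g_\nu(0) - g_\nu(y_{T_\nu})\}\mathbb{1}_{\{T_\nu > \tau_n\}}\right]. \end{aligned} $$

Furthermore, since $E_\infty[e^{u_{T_\nu} - u_{\tau_n}} \mid \mathcal{F}_{\tau_n}] = 1$, we can write

$$ \begin{aligned} E_\infty\left[\left(1 - e^{m_{\tau_n} - m_{\tau_{n-1}}}\right)\mathbb{1}_{\{T_\nu > \tau_n\}}\right] &= E_\infty\left[e^{u_{T_\nu} - u_{\tau_n}}\left(1 - e^{m_{\tau_n} - m_{\tau_{n-1}}}\right)\mathbb{1}_{\{T_\nu > \tau_n\}}\right] \\ &= E_\infty\left[e^{u_{T_\nu} - m_{\tau_n}}\left(1 - e^{m_{\tau_n} - m_{\tau_{n-1}}}\right)\mathbb{1}_{\{T_\nu > \tau_n\}}\right] \\ &= E_\infty\left[e^{u_{T_\nu}}\left(e^{-m_{\tau_n}} - e^{-m_{\tau_{n-1}}}\right)\mathbb{1}_{\{T_\nu > \tau_n\}}\right]. \end{aligned} $$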
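Substituting the two equalities in (5.6) and summing over all $n \ge 1$ we have

$$ \mathcal{J}_{\mathrm{EL}}(T)\sum_{n=1}^{\infty} E_\infty\left[e^{u_{T_\nu}}\left(e^{-m_{\tau_n}} - e^{-m_{\tau_{n-1}}}\right)\mathbb{1}_{\{T_\nu > \tau_n\}}\right] \ge \sum_{n=1}^{\infty} E_\infty\left[e^{u_{T_\nu}}\left(e^{-m_{\tau_n}} - e^{-m_{\tau_{n-1}}}\right)\{g_\nu(0) - g_\nu(y_{T_\nu})\}\mathbb{1}_{\{T_\nu > \tau_n\}}\right]. \tag{5.7} $$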

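In the second sum, interchanging summation and expectation yields

$$ \begin{aligned} &\sum_{n=1}^{\infty} E_\infty\left[e^{u_{T_\nu}}\left(e^{-m_{\tau_n}} - e^{-m_{\tau_{n-1}}}\right)\{g_\nu(0) - g_\nu(y_{T_\nu})\}\mathbb{1}_{\{T_\nu > \tau_n\}}\right] \\ &\qquad = E_\infty\left[\{g_\nu(0) - g_\nu(y_{T_\nu})\}e^{u_{T_\nu}}\sum_{n=1}^{\infty}\left(e^{-m_{\tau_n}} - e^{-m_{\tau_{n-1}}}\right)\mathbb{1}_{\{T_\nu > \tau_n\}}\right] \\ &\qquad = E_\infty\left[\{g_\nu(0) - g_\nu(y_{T_\nu})\}e^{u_{T_\nu}}\left(e^{-m_{T_\nu}} - 1\right)\right]. \end{aligned} $$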

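Similarly, for the first sum we have

$$ \sum_{n=1}^{\infty} E_\infty\left[e^{u_{T_\nu}}\left(e^{-m_{\tau_n}} - e^{-m_{\tau_{n-1}}}\right)\mathbb{1}_{\{T_\nu > \tau_n\}}\right] = E_\infty\left[e^{u_{T_\nu}}\left(e^{-m_{T_\nu}} - 1\right)\right]. $$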




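Substituting the two expressions in (5.7) we obtain

$$ \mathcal{J}_{\mathrm{EL}}(T)\,E_\infty\left[e^{u_{T_\nu}}\left(e^{-m_{T_\nu}} - 1\right)\right] \ge E_\infty\left[\{g_\nu(0) - g_\nu(y_{T_\nu})\}e^{u_{T_\nu}}\left(e^{-m_{T_\nu}} - 1\right)\right]. \tag{5.8} $$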

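There is one last inequality from (5.5) that we have not used so far, namely the one for
$n = 0$. Recalling that $\tau_0 = 0$, this inequality takes the form

$$ \mathcal{J}_{\mathrm{EL}}(T) \ge E_0[T_\nu]. \tag{5.9} $$

Using (5.3) from Lemma 4, we get

$$ E_0[T_\nu] = E_0\left[g_\nu(0) - g_\nu(y_{T_\nu})\right] = E_\infty\left[\{g_\nu(0) - g_\nu(y_{T_\nu})\}e^{u_{T_\nu}}\right]. $$

Also, since $E_\infty[e^{u_{T_\nu}}] = 1$, (5.9) is equivalent to

$$ \mathcal{J}_{\mathrm{EL}}(T)\,E_\infty\left[e^{u_{T_\nu}}\right] \ge E_\infty\left[\{g_\nu(0) - g_\nu(y_{T_\nu})\}e^{u_{T_\nu}}\right]. $$

If this is added to (5.8), we obtain

$$ \mathcal{J}_{\mathrm{EL}}(T)\,E_\infty\left[e^{u_{T_\nu} - m_{T_\nu}}\right] \ge E_\infty\left[\{g_\nu(0) - g_\nu(y_{T_\nu})\}e^{u_{T_\nu} - m_{T_\nu}}\right], $$

or, since $y_{T_\nu} = u_{T_\nu} - m_{T_\nu}$,

$$ \mathcal{J}_{\mathrm{EL}}(T)\,E_\infty\left[e^{y_{T_\nu}}\right] \ge g_\nu(0)\,E_\infty\left[e^{y_{T_\nu}}\right] - E_\infty\left[g_\nu(y_{T_\nu})e^{y_{T_\nu}}\right]. \tag{5.10} $$

Since $y_{T_\nu} \le \nu$, thanks also to Lemma 5, we have $e^{\nu} \ge E_\infty[e^{y_{T_\nu}}] \ge 1$. We
can thus divide both sides of (5.10) by $E_\infty[e^{y_{T_\nu}}]$ and obtain the desired
expression. □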
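The step in the proof of Theorem 2 that collapses the sums over $n$ rests on a telescoping identity: along any path, the terms $e^{-m_{\tau_n}} - e^{-m_{\tau_{n-1}}}$ over the occurrence times preceding the stopping time add up to $e^{-m_{T_\nu}} - 1$, because $m_{\tau_0} = 0$. The Python sketch below checks this on a single hypothetical path. It assumes that $m$ at the occurrence times is the running minimum of the sampled values of $u$, starting from $u_{\tau_0} = 0$; the helper names are illustrative only.

```python
import itertools
import math


def running_minimum(u_at_occurrences):
    """m at tau_0, tau_1, ...: running minimum of u sampled at occurrence times."""
    return list(itertools.accumulate(u_at_occurrences, min))


def telescoped_sum(m_values, n_before_stop):
    """Sum of exp(-m[n]) - exp(-m[n-1]) over n = 1, ..., n_before_stop."""
    return sum(
        math.exp(-m_values[n]) - math.exp(-m_values[n - 1])
        for n in range(1, n_before_stop + 1)
    )


# Hypothetical values of u at tau_0 = 0, tau_1, ..., tau_6 (u starts at 0).
u_samples = [0.0, 0.4, -0.3, 0.1, -1.1, -0.5, -2.0]
m_samples = running_minimum(u_samples)

# Suppose four occurrence times precede the stopping time.
n_stop = 4
lhs = telescoped_sum(m_samples, n_stop)
rhs = math.exp(-m_samples[n_stop]) - 1.0
print(lhs, rhs)
```

Only the occurrence times at which the running minimum actually drops contribute to the sum, which is the observation used just before (5.7).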

We will base our proof of optimality of $S_{\nu_\star}$ on Theorem 2. Let us first
introduce an additional technical lemma.

Lemma 6. If $T$ is a s.t. and $T_\nu = T \wedge S_\nu$, then the function $\psi(\nu) = E_\infty[T_\nu]$
is continuous and increasing in $\nu$ with $\psi(0) = 0$ and $\psi(\infty) = E_\infty[T]$.

Proof. The proof is entirely analogous to that of Lemma 3, Moustakides (2004).
Consider $\kappa > \nu \ge 0$, then

$$ 0 \le T_\kappa - T_\nu \le S_\kappa - S_\nu, $$

from which we obtain

$$ 0 \le \psi(\kappa) - \psi(\nu) \le h_\kappa(0) - h_\nu(0). $$

From (4.9) we have that the function $h_\nu(0)$ is continuous in $\nu$. If we use this
property in the previous relation, we deduce that $\psi(\nu)$ is also continuous.
Finally, we can directly verify that the two limiting values $\psi(0)$, $\psi(\infty)$ are
correct. □

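An immediate consequence of the previous lemma is the fact that if
$E_\infty[T] > \gamma$ then we can find a threshold $\nu$ such that $E_\infty[T_\nu] = \gamma$. Since
$\mathcal{J}_{\mathrm{EL}}(T) \ge \mathcal{J}_{\mathrm{EL}}(T_\nu)$, this suggests that for the proof of optimality of $S_{\nu_\star}$ we
can limit ourselves to s.t. that satisfy the false alarm constraint with equality.

Theorem 3. If a s.t. $T$ satisfies $E_\infty[T] = \gamma$ then it possesses an extended
Lorden measure $\mathcal{J}_{\mathrm{EL}}(T)$ that is no less than $g_{\nu_\star}(0) = \mathcal{J}_{\mathrm{EL}}(S_{\nu_\star})$.

Proof. From $E_\infty[T] = \gamma$, thanks to Lemma 6, for every $\epsilon > 0$ we can
find a threshold $\nu_\epsilon$ such that for $T_{\nu_\epsilon} = T \wedge S_{\nu_\epsilon}$ we have

$$ \gamma \ge E_\infty\left[T_{\nu_\epsilon}\right] \ge \gamma - \epsilon. \tag{5.11} $$

Recalling from (5.1) that $h_{\nu_\star}(0) = \gamma$ and applying (5.2) from Lemma 4 to (5.11),
we obtain

$$ \epsilon \ge E_\infty\left[h_{\nu_\star}(y_{T_{\nu_\epsilon}})\right] \ge 0. \tag{5.12} $$

Define the function $p(y) = e^{y} g_{\nu_\star}(y) - (r_0/r_\infty) h_{\nu_\star}(y)$ and consider the
derivative $p'(y)$. By direct substitution we can verify that $e^{y} g'_{\nu_\star}(y) =
(r_0/r_\infty) h'_{\nu_\star}(y)$, from which we deduce that

$$ p'(y) = e^{y} g_{\nu_\star}(y). $$

Since $g_{\nu_\star}(y)$ is strictly decreasing in $y$ and also satisfies $g_{\nu_\star}(\nu_\star) = 0$, this
suggests that $p'(y)$ has the same sign as $\nu_\star - y$, or that $p(y)$ has a global
maximum at $y = \nu_\star$. Because $p(\nu_\star) = 0$, this means that $p(y) \le 0$, which yields
$E_\infty[p(y_{T_{\nu_\epsilon}})] \le 0$. Using this inequality and replacing $p(y)$ by its definition,
we obtain

$$ 0 \ge E_\infty\left[p(y_{T_{\nu_\epsilon}})\right] = E_\infty\left[e^{y_{T_{\nu_\epsilon}}} g_{\nu_\star}(y_{T_{\nu_\epsilon}})\right] - \frac{r_0}{r_\infty} E_\infty\left[h_{\nu_\star}(y_{T_{\nu_\epsilon}})\right]. \tag{5.13} $$

From Theorem 2, using (5.12), (5.13) and Lemma 5, we can now write

$$ \mathcal{J}_{\mathrm{EL}}(T) \ge g_{\nu_\star}(0) - \frac{E_\infty\left[e^{y_{T_{\nu_\epsilon}}} g_{\nu_\star}(y_{T_{\nu_\epsilon}})\right]}{E_\infty\left[e^{y_{T_{\nu_\epsilon}}}\right]} \ge g_{\nu_\star}(0) - \frac{r_0}{r_\infty}\epsilon. $$

Since $\epsilon$ is arbitrary, this means that $\mathcal{J}_{\mathrm{EL}}(T) \ge g_{\nu_\star}(0)$, thus establishing
optimality of ECUSUM. □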
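The "direct substitution" in the proof of Theorem 3 can be checked numerically. The Python sketch below does so under stated assumptions: it takes $g_\nu$ and $h_\nu$ to be the piecewise solutions of Section 4 with constants $A_0 = \mu^2 r_0/(2\lambda) - 1$ and $A_\infty = \mu^2 r_\infty/(2\lambda) + 1$ (forms chosen to be consistent with (4.8) and (4.9)), and approximates derivatives by central differences. All function names are hypothetical. The identities do not depend on the particular threshold, so an arbitrary `nu` is used in place of $\nu_\star$.

```python
import math


def roots(mu, lam):
    """Assumed positive roots r0, r_inf with r0 * r_inf = 2*lam/mu**2."""
    s = math.sqrt(1.0 + 8.0 * lam / mu ** 2)
    return 0.5 * (s - 1.0), 0.5 * (s + 1.0)


def g(y, nu, mu, lam):
    """Assumed form of g_nu(y) = E_0[S_nu | y_0 = y]."""
    r0, _ = roots(mu, lam)
    a0 = mu ** 2 * r0 / (2.0 * lam) - 1.0
    c = 2.0 / mu ** 2
    if y >= 0.0:
        return c * (nu - y + a0 * (math.exp(-y) - math.exp(-nu)))
    return c * (nu + a0 * (1.0 - math.exp(-nu))) + (1.0 - math.exp(r0 * y)) / lam


def h(y, nu, mu, lam):
    """Assumed form of h_nu(y) = E_infinity[S_nu | y_0 = y]."""
    _, r_inf = roots(mu, lam)
    a_inf = mu ** 2 * r_inf / (2.0 * lam) + 1.0
    c = 2.0 / mu ** 2
    if y >= 0.0:
        return c * (y - nu + a_inf * (math.exp(nu) - math.exp(y)))
    return c * (-nu + a_inf * (math.exp(nu) - 1.0)) + (1.0 - math.exp(r_inf * y)) / lam


def p(y, nu, mu, lam):
    """p(y) = e^y g_nu(y) - (r0 / r_inf) h_nu(y), as defined in the proof."""
    r0, r_inf = roots(mu, lam)
    return math.exp(y) * g(y, nu, mu, lam) - (r0 / r_inf) * h(y, nu, mu, lam)


def derivative(f, y, step=1e-6):
    """Central finite-difference approximation of f'(y)."""
    return (f(y + step) - f(y - step)) / (2.0 * step)


nu, mu, lam = 1.2, 1.5, 0.7
r0, r_inf = roots(mu, lam)
for y in (-3.0, -1.0, 0.3, 0.9, 2.0):
    dg = derivative(lambda t: g(t, nu, mu, lam), y)
    dh = derivative(lambda t: h(t, nu, mu, lam), y)
    # The two printed numbers should agree up to finite-difference error.
    print(y, math.exp(y) * dg, (r0 / r_inf) * dh)
```

On a grid of values of `y`, `p` stays nonpositive and vanishes at `y = nu`, which is the property used to obtain (5.13).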